Python: fix(python): unblock connect() when ClientSession or session.initialize() fails#14016
Conversation
…ze() fails When _inner_connect() failed to create a ClientSession or to call session.initialize(), it raised KernelPluginInvalidConfigurationError without first calling ready_event.set(). The connect() method was waiting on that event in a background task, so it would hang indefinitely (until an external timeout fired, e.g. 30 s in PromptFlow). The transport-failure branch already called ready_event.set() before raising, so the pattern was consistent there but missing in the two later branches. Fix: - Add ready_event.set() before raise in the ClientSession creation exception handler. - Add ready_event.set() before raise in the session.initialize() exception handler. - After await ready_event.wait() in connect(), check whether the background task finished with an exception and re-raise it so that callers (including __aenter__) receive the error instead of silently succeeding with a broken state. Fixes microsoft#13414
There was a problem hiding this comment.
Pull request overview
This PR updates the Python MCP connector to address connection hangs when background connection setup fails, while also introducing sampling consent and MCP tool/prompt name conflict handling.
Changes:
- Propagates background
_inner_connect()failures afterready_eventis set. - Adds sampling consent callback support across MCP plugin constructors.
- Skips MCP tools/prompts whose normalized names conflict with existing plugin attributes.
Comments suppressed due to low confidence (1)
python/semantic_kernel/connectors/mcp.py:355
ready_eventis set only after awaiting_exit_stack.aclose(). If cleanup raises while handling a failedsession.initialize(), the background task exits without signaling the event and the caller still hangs inconnect(). The event should be signaled reliably even when cleanup fails.
await self._exit_stack.aclose()
ready_event.set()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # If the background task finished before (or exactly when) ready_event was | ||
| # set, it means it raised an exception on an error path. Re-raise it here | ||
| # so callers always receive the error rather than silently succeeding with a | ||
| # broken connection state. | ||
| if self._current_task.done(): | ||
| exc = self._current_task.exception() | ||
| if exc is not None: | ||
| raise exc |
| await self._exit_stack.aclose() | ||
| ready_event.set() |
| session: ClientSession | None = None, | ||
| kernel: Kernel | None = None, | ||
| request_timeout: int | None = None, | ||
| sampling_consent_callback: SamplingConsentCallback | None = None, |
| if self._current_task.done(): | ||
| exc = self._current_task.exception() | ||
| if exc is not None: | ||
| raise exc |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 94%
✓ Correctness
The core bugfix is correct: adding
ready_event.set()beforeraisein the two missing exception handlers in_inner_connect(ClientSession creation and session.initialize()) makes them consistent with the already-correct transport-failure branch, preventingconnect()from hanging indefinitely. The post-wait checkif self._current_task.done()inconnect()correctly re-raises the exception to the caller, since in CPython's asyncio the task completes (via the raise) before the event loop resumes the waiter scheduled byready_event.set(). The sampling consent callback and name-conflict detection additions are straightforward and correctly implemented. No correctness issues found.
✓ Security Reliability
The PR correctly fixes the
connect()hang bug by addingready_event.set()beforeraisein the two previously-missing exception handlers and re-raising task exceptions inconnect()after the event fires. The new sampling consent callback follows a fail-closed design (denies on exception), and the name-conflict guard prevents a malicious MCP server from overwriting critical plugin attributes viasetattr. No security or reliability issues found in the changed code.
✗ Test Coverage
The PR fixes a real hang bug by adding ready_event.set() in two exception handlers and re-raising background-task exceptions in connect(). New features (sampling consent callback, name conflict detection) have good test coverage. However, the core bug fix itself — the two new ready_event.set() calls and the error re-raise in connect() — has NO dedicated test. The existing test_mcp_plugin_failed_get_session only covers the transport-failure path that was already correct. The PR description explicitly identifies ClientSession-creation and session.initialize() failure as testable paths but no tests were added for them.
✗ Design Approach
The new conflict guard only snapshots
dir(self)once, so it prevents collisions with built-in plugin attributes but still silently allows collisions between MCP functions loaded in different passes. A tool and prompt that normalize to the same local name can still overwrite each other, which means the change does not fully solve the name-shadowing problem it introduces tests for.
Flagged Issues
- The primary bug fix (ClientSession creation failure and session.initialize() failure no longer hanging
connect()) has no dedicated test coverage. The PR description acknowledges these paths are testable by patchingClientSession.__aenter__orClientSession.initializeto raise (following the existingtest_mcp_plugin_failed_get_sessionpattern), but no such tests were added, leaving no regression guard for the exact hang scenario this PR fixes. -
mcp.py:513-515caches reserved names fromdir(self)only once before loading begins, but MCP functions are later installed viasetattr(self, local_name, func)at lines 538 and 553. A tool and prompt that normalize to the samelocal_name(e.g. both becomesafe_tool) will not trigger the conflict guard on the second pass; the later load silently overwrites the earlier one, andKernelPlugin.from_object(kernel_plugin.py:239-247) exports only the surviving attribute, silently dropping a remote capability.
Automated review by nileshpatil6's agents
Description
Fixes #13414
The bug
MCPPluginBase.connect()spawns_inner_connect()as a background task and then waits on aready_event:_inner_connect()has three failure points inside theif not self.session:block:ready_event.set()before raise?get_mcp_client())ClientSession(...)creationsession.initialize()When the MCP server returns HTTP 401/403,
session.initialize()raises an exception. The handler closes the exit stack and re-raisesKernelPluginInvalidConfigurationError, but never callsready_event.set(). Because the exception is trapped inside the background task,connect()keeps waiting onready_eventindefinitely, hanging the caller until an external timeout fires (e.g. the 30-second PromptFlow deadline).The same hang applies when
ClientSessionconstruction itself fails.The fix
Two changes:
_inner_connect: addready_event.set()beforeraisein both theClientSessionandsession.initialize()exception handlers, making them consistent with the already-correct transport-failure branch.connect: afterawait ready_event.wait(), check whether the background task has already finished with an exception and re-raise it. This ensures the error surfaces to the caller (including__aenter__) instead of silently succeeding with a broken connection state.Testing
The existing
test_mcp_plugin_failed_get_sessiontest covers the transport-failure path. The two new paths (ClientSession failure and session.initialize() failure) can be verified with the same pattern used in that test by patchingClientSession.__aenter__orClientSession.initializeto raise.Contribution Checklist